FEATURE EXTRACTION FOR BEARING PROGNOSTICS AND
HEALTH  MANAGEMENT  (PHM)  -  A  SURVEY 

Weizhong Yan, Hai Qiu, and Naresh Iyer


Industrial  Artificial  Intelligence  Lab 
GE  Global  Research  Center 
One  Research  Circle 
Niskayuna,  NY  12309 
{yan, qiu, iyerna}@crd.ge.com


Abstract: Feature extraction in bearing PHM involves extracting, from the original sensor
measurements, characteristic signatures that are sensitive to bearing condition and thus most
useful in determining bearing faults. The quality of the extracted features directly affects the
performance and effectiveness of bearing PHM. Feature extraction is therefore a critical component
of bearing PHM. To improve PHM effectiveness and minimize bearing maintenance costs, a large
amount of research has been conducted on extracting salient features for PHM, which has led to a
considerable number of feature extraction techniques. Our main effort in this paper is to survey
some of the major techniques explored so far, focusing on more recent advancements.
Our  endeavor  also  includes  pointing  out  the  advantages  and  disadvantages  of  each  of  those 
techniques.  This  paper  attempts  to  serve  as  a  general  reference  for  bearing  PHM  practitioners  and 
as  a  general  guide  for  choosing  proper  feature  extraction  methods  for  bearing  PHM  systems. 

Keywords:  Bearing;  Feature  extraction;  Diagnosis;  Prognosis;  Health  Management;  Vibration 
Analysis 

1. Introduction: Prognostics and health management (PHM) is a new maintenance concept/paradigm.
PHM and its sibling, condition-based maintenance (CBM), are the result of the maintenance
industry’s “paradigm shift” away from traditional time-based maintenance toward more
cost-effective maintenance. While both PHM and CBM use machinery run-time data to determine asset
condition, which is then used to schedule required repair and maintenance prior to breakdown, PHM
differs from CBM in that PHM has the capability of predicting future health, including remaining
useful life (RUL) [21]. It is this predictive capability that makes PHM most effective in reducing
operational and support (O&S) cost and life-cycle total ownership cost (TOC).

Bearings are among the most common components in modern rotating machinery. Bearing failures are
considered to be the leading cause of breakdowns in rotating machinery and result in a significant
increase in O&S cost. As a tool for effectively preventing unexpected bearing failures while
maximizing bearing uptime, bearing PHM can significantly reduce O&S cost and improve the safety of
machinery. As a result, bearing PHM technologies have evolved rapidly in recent years and have
attracted tremendous research attention.

Figure  1  illustrates  the  overall  structure  of  a  typical  bearing  PHM  system.  It  consists  of  three 
essential  modules,  namely,  sensing,  feature  identification,  and  PHM.  The  PHM  module  of 
bearing  PHM  systems  typically  consists  of  core  functions,  such  as,  anomaly  detection,  fault 
diagnosis, prognosis, and decision-making. Sensors in the sensing module of a bearing PHM
system can be of several types, including vibration, temperature, chemical, acoustic emission, and
sound pressure [12]. In real-world PHM systems, raw sensor measurements are rarely used directly
by PHM functions. Instead, the raw sensor measurements are preprocessed (filtering, de-noising,
etc.); more importantly, signatures are extracted from the raw sensor measurements, and those
signatures that are most sensitive to bearing condition, and thus most useful in determining
bearing faults, are further selected for PHM functions. The preprocessing, feature extraction, and
feature selection functions thus constitute the feature ID module, which essentially “converts”
sensor measurements into information that is more effective, accurate, and reliable for PHM
functions. The feature ID module is therefore regarded as a critical part of bearing PHM systems.

Identifying salient features for bearing PHM poses challenges. First, various types of sensors
with different characteristics (data type, sampling rate, signal-to-noise ratio, etc.) may be
involved in a bearing PHM system, and identifying salient features from such a large amount of
sensory data can be difficult. Second, the individual functions of the PHM module (detection,
diagnosis, and prognosis) have their own metrics for measuring feature goodness. Feature selection
in the feature ID module thus needs to take into account the fact that a set of features that is
good for one PHM function (e.g., diagnosis) may not necessarily be good for another (e.g.,
prognosis). That is, feature selection is PHM function-dependent.


Figure 1: Overall structure of a typical bearing PHM system

Both  the  importance  and  the  challenges  of  identifying  salient  features  in  bearing  PHM  have 
inspired  great  research  interest,  thus  resulting  in  a  large  number  of  feature  extraction  methods. 
For  vibration  data  alone,  a  large  number  of  feature  extraction  methods  have  been  proposed, 
ranging  from  those  using  traditional  spectral  analysis,  to  those  using  wavelet  analysis  [23] [24],  to 
those  using  more  advanced  signal  processing  techniques  [15]  [18]  [2]. 

With such a large number of feature extraction methods in the literature, a proper categorization
is needed that serves as an overview of feature extraction methods and provides general guidance
on choosing a feature extraction method for a specific application. However, to the best of our
knowledge, no categorization or review devoted specifically to feature extraction for bearing PHM
has been published. The closest work in this context is [25], where a subsection of the paper is
devoted to summarizing waveform data analysis techniques. This paper attempts to survey some major
feature extraction techniques, focusing on more recent developments.

In this paper, due to space limitations, we restrict our effort to vibration signals only; that
is, we focus on reviewing vibration-based feature extraction methods. We intend to publish, in a
separate paper, a more comprehensive review of techniques associated with all three feature ID
functions (preprocessing, feature extraction, and feature selection) and covering all types of
bearing sensor measurements.

The  rest  of  this  paper  is  organized  as  follows.  Section  2  gives  some  fundamentals  of  bearings. 
Different  feature  extraction  methods  are  discussed  and  categorized  in  Section  3.  Section  4 
concludes  the  paper. 

2. Bearing fundamentals: One of the basic purposes of a bearing is to provide a low-friction
environment to support and guide a rotating shaft. There are many ways to classify bearings, for
example by application, material, or lubrication mechanism. Typically, bearings are classified
into three general categories based on their construction: fluid film, rolling element, and
electromagnetic. This categorization excludes some bearing types, such as air bearings, which are
used only in special applications. Some of the feature extraction techniques surveyed in this
paper may be applicable to all types of bearings. Our focus in this paper, however, is on rolling
element bearings only.

Rolling  element  bearings  often  work  well  in  non-ideal  conditions,  but  sometimes  minor  problems 
cause  bearings  to  fail  quickly  and  mysteriously.  For  example,  with  a  stationary  load,  small 
vibrations  can  gradually  press  out  the  lubricant  between  the  races  and  rollers  or  balls,  and 
eventually  lead  to  bearing  failure  due  to  lack  of  lubrication. 

After nearly four decades of studies on bearing failure mechanisms, the theoretical aspects of
bearing failure modes are well understood. There are three usual limits to the lifetime or
load capacity of a bearing: abrasion, fatigue, and pressure-induced welding [55] [56]. Abrasion is
when  the  surface  is  eroded  by  hard  contaminants  scraping  at  the  bearing  materials.  Fatigue  is 
when  a  material  breaks  after  it  is  repeatedly  loaded  and  released.  Pressure-induced  welding  is 
when  two  metal  pieces  are  pressed  together  at  very  high  pressure  and  they  become  one. 

Although  there  are  many  other  apparent  causes  of  bearing  failure,  most  can  be  reduced  to  these 
three.  For  example,  a  bearing  which  is  run  dry  of  lubricant  fails  not  because  it  is  "without 
lubricant",  but  because  lack  of  lubrication  leads  to  fatigue  and  welding,  and  the  resulting  wear 
debris  can  cause  abrasion.  Similar  events  occur  in  false  brinelling  damage.  In  high  speed 
applications,  the  oil  flow  also  reduces  the  bearing  metal  temperature  by  convection.  The  oil 
becomes  the  heat  sink  for  the  friction  losses  generated  by  the  bearing. 

A rolling element bearing has four major components: outer race, inner race, rolling elements
(balls, rollers, needles, etc.), and cage. Any of the four components may be damaged during
operation. Generally, the signature of a damaged bearing consists of exponentially decaying
ringing that recurs periodically at the characteristic defect frequency of the damaged component.
The vibration signal of a defective bearing is therefore usually considered to be amplitude
modulated at the characteristic defect frequency. Matching the measured vibration spectrum against
the characteristic defect frequencies enables both defect detection and diagnosis of the defective
component [52] [53] [54].
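
As an illustration of the characteristic defect frequencies mentioned above, the following sketch
evaluates the standard kinematic relations for a rolling element bearing. It assumes a stationary
outer race, a rotating inner race, and no slip; these assumptions, the function name, and the
parameter names are ours and are not taken from the cited references.

```python
import math


def bearing_defect_frequencies(shaft_hz, n_elements, d_element, d_pitch,
                               contact_angle_deg=0.0):
    """Kinematic defect frequencies in Hz (assumes stationary outer race, no slip)."""
    # Ratio of rolling element diameter to pitch diameter, projected by contact angle.
    ratio = (d_element / d_pitch) * math.cos(math.radians(contact_angle_deg))
    return {
        # Fundamental train (cage) frequency.
        "FTF": 0.5 * shaft_hz * (1.0 - ratio),
        # Ball pass frequency, outer race.
        "BPFO": 0.5 * n_elements * shaft_hz * (1.0 - ratio),
        # Ball pass frequency, inner race.
        "BPFI": 0.5 * n_elements * shaft_hz * (1.0 + ratio),
        # Ball (rolling element) spin frequency.
        "BSF": 0.5 * (d_pitch / d_element) * shaft_hz * (1.0 - ratio ** 2),
    }


if __name__ == "__main__":
    # Hypothetical geometry chosen only for illustration.
    for name, hz in bearing_defect_frequencies(30.0, 9, 7.94, 39.04).items():
        print(f"{name}: {hz:.2f} Hz")
```

In practice the measured frequencies deviate slightly from these values because of slip, so
features are usually computed over a small band around each frequency rather than at a single
spectral line.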

3. Feature extraction techniques: There is no unique way to categorize feature extraction
methods used for bearing PHM. Figure 2 shows our taxonomy of vibration-based feature
extraction methods. The primary division of vibration-based feature extraction methods is based on
whether or not the method can deal with non-stationary signals.

Figure 2: Taxonomy of vibration-based feature extraction methods

3.1. Stationary signals: Vibration signals acquired from bearings can be either stationary or
non-stationary. While stationary signals are characterized by time-invariant statistical
properties, such as the mean value, the statistical properties of a non-stationary signal change
over time. Vibration
signals  from  real-world  bearings  are  almost  always  non-stationary  since  bearings  are  inherently 
dynamic  (e.g.,  speed  and  load  condition  change  over  time).  However,  non-stationary  signals  are 
often  approximated  as  stationary,  especially  within  a  short  time  window,  for  computational 
convenience.  For  stationary  signals,  there  are  time-domain  and  frequency-domain  techniques  for 
feature  extraction. 

3.1.1. Time domain techniques: When the rolling elements of a bearing pass the defect location,
wide-band impulses are generated. Those impulses then excite some of the vibrational modes of the
bearing and its supporting structure. As a result, the sensed vibration signals (waveforms) differ
from those under fault-free conditions in either the overall vibration level or the distribution
of vibration magnitude. Time-domain feature extraction identifies signatures in the sensed
time-domain waveforms (vibration signals and/or acoustic emissions) that are sensitive to bearing
condition. Depending on the underlying technology, time-domain feature extraction techniques can
be further categorized into three groups: statistical-based, model-based, and signal
processing-based approaches, each of which is detailed below.

a) Statistical-based approaches: One of the most traditional time-domain feature extraction
methods is to calculate descriptive statistics of vibration signals, including those measuring the
power content of the signal, such as the root mean square (RMS); those measuring signal magnitude
and pattern, such as the peaks, the peak-to-peak intervals, and the crest factor; and those
measuring the signal distribution, such as the mean (1st moment), the variance (2nd moment), the
skewness (3rd moment), and the kurtosis (4th moment). Definitions of these descriptive statistics
can be found in many publications (e.g., [28]) and thus are not provided here.
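
For readers who want a concrete starting point, the sketch below computes several of these
descriptive statistics with NumPy. It uses the common textbook definitions (population moments,
kurtosis not offset by 3), which may differ in normalization from those in [28]; the function
name and the dictionary keys are our own, and the peak-to-peak value here is an amplitude range
rather than a time interval.

```python
import numpy as np


def time_domain_features(x):
    """Common descriptive statistics of a vibration waveform (textbook definitions)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    centered = x - mean
    variance = np.mean(centered ** 2)      # population variance (2nd central moment)
    std = np.sqrt(variance)
    rms = np.sqrt(np.mean(x ** 2))         # power content
    peak = np.max(np.abs(x))
    return {
        "mean": mean,
        "variance": variance,
        "rms": rms,
        "peak": peak,
        "peak_to_peak": x.max() - x.min(),  # amplitude range
        "crest_factor": peak / rms,
        "skewness": np.mean(centered ** 3) / std ** 3,
        "kurtosis": np.mean(centered ** 4) / std ** 4,  # equals 3 for Gaussian data
    }


if __name__ == "__main__":
    t = np.arange(1000) / 1000.0
    print(time_domain_features(np.sin(2 * np.pi * 5 * t)))
```

Impulsive fault signatures tend to raise the crest factor and kurtosis well before they change the
RMS appreciably, which is one reason these statistics are often tracked together.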

These descriptive statistics can be calculated directly on raw signals. To be more effective in
bearing condition monitoring, however, they are frequently calculated on filtered or processed
signals. In [28], the descriptive statistics of vibration signals were calculated for two
different frequency bands. Noting that signal differences (derivatives) and sums (integrals) are
equivalent to high-pass and low-pass filtering, respectively, [6] [5] calculated the statistics on
the derivatives and integrals of the signals.

b) Model-based approaches: Model-based feature extraction involves treating vibration signals
as time series data and fitting them to a parametric time series model. The model parameters are
then used as features. The most popular time series model used for bearing diagnosis is the
autoregressive (AR) model. Poyhonen et al. [27] applied an AR model to vibration signals collected
from an induction motor and used the AR model coefficients as extracted features. Other
time-series models, such as the autoregressive moving average (ARMA) model, and nonlinear models,
such as neural networks and support vector machines, can also be used.
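
A minimal sketch of AR-based feature extraction is given below. It fits the AR coefficients by
ordinary least squares, which is only one of several possible estimators (Yule-Walker and Burg
are common alternatives) and is not necessarily the one used in [27]; the function name and the
default model order are assumptions made for illustration.

```python
import numpy as np


def ar_features(x, order=4):
    """Least-squares AR(order) fit; returns (coefficients, residual variance)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Column k-1 holds x[t-k] for t = order .. n-1.
    lagged = np.column_stack([x[order - k:n - k] for k in range(1, order + 1)])
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    residual = target - lagged @ coeffs
    return coeffs, residual.var()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    e = rng.normal(size=5000)
    x = np.zeros_like(e)
    for t in range(2, len(e)):
        # Synthetic AR(2) process used only to exercise the function.
        x[t] = 1.5 * x[t - 1] - 0.7 * x[t - 2] + e[t]
    print(ar_features(x, order=2))
```

The coefficient vector, optionally together with the residual variance, can then be passed to a
classifier as the feature vector for the analyzed signal segment.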

Baillie and Mathew [8] compared three different autoregressive models, namely linear
autoregressive models, back-propagation neural networks, and radial basis function networks,
although they used the three models for model-based bearing fault diagnosis rather than
explicitly for feature extraction.

A recent direction in model-based feature extraction appears to be extending approaches that work
for stationary signals to non-stationary signals. For example, Chen et al. [26] used empirical
mode decomposition (EMD) to decompose non-stationary signals into a number of intrinsic mode
function (IMF) components that are stationary. An AR model was then applied to each of the IMF
components.

c) Time-domain DSP approaches: Classical digital signal processing (DSP) includes filtering,
averaging, correlation, and convolution. Another popular DSP technique is synchronous
averaging [10]. More recently, several techniques rooted in chaos theory have been adapted to
feature extraction, for example, fractal dimension [1] [2] and correlation dimension [14] [15].
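
The idea behind synchronous averaging can be sketched as follows, under the simplifying
assumptions that the shaft speed is constant and that each revolution spans an integer number of
samples. Real implementations typically resample the signal against a tachometer reference
first; that step, like the function and parameter names, is outside what the text describes and
is omitted here.

```python
import numpy as np


def synchronous_average(x, samples_per_rev):
    """Average a signal over whole revolutions (assumes constant, known speed)."""
    x = np.asarray(x, dtype=float)
    n_revs = len(x) // samples_per_rev
    if n_revs < 1:
        raise ValueError("signal is shorter than one revolution")
    # Drop the incomplete trailing revolution, then stack one revolution per row.
    segments = x[:n_revs * samples_per_rev].reshape(n_revs, samples_per_rev)
    # Components not synchronous with the shaft average toward zero.
    return segments.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    pattern = np.sin(2 * np.pi * np.arange(64) / 64)
    noisy = np.tile(pattern, 200) + rng.normal(size=64 * 200)
    print(np.max(np.abs(synchronous_average(noisy, 64) - pattern)))
```

Time-domain statistics can then be computed on the averaged waveform, or on the residual
obtained by subtracting it from each revolution.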

3.1.2. Frequency domain techniques: Time-domain features are generally considered to be good
for fault detection but less effective for fault isolation, that is, for determining where the
defect is located (inner race, outer race, rolling elements, or cage). For fault isolation,
frequency-domain features are generally more effective. Frequency-domain feature extraction
methods include spectral analysis, envelope analysis, cepstrum, and higher-order spectra.

a) Spectral Analysis: Spectral analysis is the most widely used method. A spectrum (in practice,
a power spectrum) obtained from the fast Fourier transform (FFT) of a vibration signal represents
the frequency characteristics of the signal. Either the entire spectrum or the amplitudes at the
bearing characteristic frequencies, calculated from the power spectrum of the vibration signal,
can be used as features.

In Li’s work [3], to account for energy leakage (spreading over a wide frequency band), features
were calculated as the sum of the amplitudes over a 5 Hz frequency band centered at each defect
frequency. Instead of a fixed bandwidth, Saleh et al. (2003) further proposed variable bands whose
bandwidths are determined as a percentage of the frequencies of interest. The features at the
frequencies of interest were then calculated as the average value of the banded components, the
maximum value of the banded components, or the energy within the frequency bands.
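
The banded features described above can be sketched as follows. The Hann window, the amplitude
scaling, and the function and key names are our assumptions; the cited works may differ in these
details.

```python
import numpy as np


def band_features(x, fs, center_hz, bandwidth_hz):
    """Sum, mean, max, and energy of spectral amplitudes in a band around center_hz."""
    x = np.asarray(x, dtype=float)
    # Hann window to limit leakage (an assumption, not prescribed by the text).
    amplitude = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = amplitude[np.abs(freqs - center_hz) <= bandwidth_hz / 2.0]
    if band.size == 0:
        raise ValueError("no spectral lines fall inside the requested band")
    return {
        "sum": band.sum(),
        "mean": band.mean(),
        "max": band.max(),
        "energy": np.sum(band ** 2),
    }


if __name__ == "__main__":
    fs = 1000.0
    t = np.arange(2000) / fs
    x = np.sin(2 * np.pi * 100 * t)
    print(band_features(x, fs, 100.0, 5.0))          # fixed 5 Hz band
    print(band_features(x, fs, 100.0, 0.04 * 100))   # band set to 4% of the frequency
```

The 4% figure in the second call is an arbitrary illustration of a variable band; a suitable
percentage depends on the expected slip and speed variation.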

Descriptive statistics of spectra can also be used as features. For example, Wu and Chow (2004)
[4]  extracted  the  total  power,  average  frequency,  and  dispersion  indices  (2nd  and  3rd  central 
moments)  of  the  power  spectra  of  vibration  signals. 

Raw spectra of bearing vibration signals may not be appropriate for directly calculating features,
and smoothing them may be necessary first. Power spectral density (PSD) estimation is considered
to be one of the spectrum smoothing techniques.

b) Envelope analysis: Envelope analysis, also known as amplitude demodulation or the high
frequency resonance technique (HFRT) [47], is another widely used frequency domain technique
for bearing fault diagnosis. Envelope analysis consists of two steps: band-pass filtering and
enveloping. During bearing operation, wide-band impulses are generated when rolling elements
pass over the defect, and these periodic impulses excite certain vibration modes of the bearing
and its supporting structure. Band-pass filtering keeps only the signal components around the
resonance frequency. Enveloping then removes the structural resonance and preserves the defect
impact frequency. Envelope analysis can thus be used for detecting incipient bearing faults. The
key to effective envelope analysis is an intelligent selection of the frequency band.
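
The two steps can be sketched with SciPy as shown below. The Butterworth filter, its order, and
the Hilbert-transform envelope are our choices for illustration; the text does not prescribe a
particular filter or demodulation method, and the function name is hypothetical.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt


def envelope_spectrum(x, fs, band_hz, order=4):
    """Band-pass around a resonance, demodulate, and return the envelope spectrum."""
    # Step 1: zero-phase band-pass filtering around the excited resonance.
    sos = butter(order, band_hz, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, np.asarray(x, dtype=float))
    # Step 2: envelope as the magnitude of the analytic signal.
    envelope = np.abs(hilbert(filtered))
    envelope = envelope - envelope.mean()   # drop the DC term
    spectrum = np.abs(np.fft.rfft(envelope)) / len(envelope)
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    return freqs, spectrum


if __name__ == "__main__":
    fs = 20000.0
    t = np.arange(20000) / fs
    # Synthetic 3 kHz "resonance" amplitude modulated at 100 Hz.
    x = (1 + 0.8 * np.cos(2 * np.pi * 100 * t)) * np.sin(2 * np.pi * 3000 * t)
    freqs, spectrum = envelope_spectrum(x, fs, (2500.0, 3500.0))
    print(freqs[np.argmax(spectrum)])
```

Amplitudes of the envelope spectrum at the characteristic defect frequencies and their
harmonics can then serve as features.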

c) Cepstral analysis: The cepstrum, defined as the power spectrum of the logarithm of the power
spectrum of the signal, is used for detecting periodicity in spectra. A defect in a bearing
element (balls and races) generates impulses, and the bearing and its structure respond to those
impulses. Bearing vibration signals are thus the result of convolution between the impulses and
the system’s response to them, which leads to harmonic series in the spectra. Cepstral analysis
detects the common spacing between the harmonics. Cepstrum analysis has been used for bearing
fault detection and diagnosis, e.g., [48].
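
A minimal sketch of the power cepstrum follows. It uses the inverse FFT of the log power
spectrum, which for a real signal differs from a forward transform only by a scale factor; the
small constant added before the logarithm and the function name are our assumptions.

```python
import numpy as np


def power_cepstrum(x, fs):
    """Power cepstrum: squared magnitude of the inverse FFT of the log power spectrum."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.fft(x)) ** 2
    log_power = np.log(power + 1e-12)        # small offset avoids log(0)
    cepstrum = np.abs(np.fft.ifft(log_power)) ** 2
    quefrency = np.arange(len(x)) / fs        # seconds
    return quefrency, cepstrum


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    fs = 1000.0
    x = 1e-4 * rng.normal(size=2000)
    x[::50] += 1.0                            # impulses every 0.05 s
    quefrency, cepstrum = power_cepstrum(x, fs)
    k = 20 + np.argmax(cepstrum[20:80])       # search away from zero quefrency
    print(quefrency[k])
```

A peak at quefrency q corresponds to a harmonic family with spacing 1/q in the spectrum, so the
cepstrum values at the reciprocals of the defect frequencies are natural feature candidates.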

d) Higher order spectra: Higher order spectra typically refer to the bispectrum and trispectrum.
Higher order spectra are also called higher order statistics since the bispectrum and trispectrum
are essentially the Fourier transforms of the 3rd- and 4th-order statistics of signals. Higher
order spectra (i.e., bispectrum or trispectrum) have been shown to carry more diagnostic
information than the power spectrum. Advantages of using higher order spectra include suppression
of additive Gaussian noise, identification of non-minimum phase systems, and detection and
identification of nonlinear systems [49]. Li et al. [36] presented bicoherence signal analysis
for the detection of faults in bearings. The rationale behind bicoherence analysis is that
interactive coupling between various frequencies and existing bearing fault frequencies can be
amplified and detected by monitoring the statistical dependence or correlations between the
energies in the corresponding frequency combinations.

3.2. Non-stationary signals: For non-stationary signals, since the statistical properties change
over time, traditional spectral analysis becomes ineffective. Techniques used for tackling
non-stationary signals include time-frequency techniques and wavelet analysis, which are detailed
as follows.

3.2.1. Time-frequency analysis techniques: Time-frequency analysis techniques analyze signals
in time and frequency simultaneously to identify time-dependent variations of the frequency
components within the signal, which makes them a powerful tool for analyzing non-stationary
signals. The most commonly used time-frequency analysis techniques are the short-time Fourier
transform (STFT), the Wigner-Ville distribution, and the wavelet transform. In this paper we
treat wavelets as a separate group because of their popularity and variety. Other newly developed
time-frequency analysis techniques include spectral kurtosis, empirical mode decomposition, and
cyclostationary analysis.

a) Short-time Fourier transform (STFT): STFT tackles non-stationary signals by applying the
conventional FFT to a sliding window of the signal, within which the signal can be assumed to be
locally stationary. The squared magnitude of the STFT, often referred to as the spectrogram,
provides the energy density spectrum of the signal as a function of time. Time resolution is
determined by the segment length, so the success of STFT hinges on properly choosing the window
length, which is often difficult. The use of STFT for bearing monitoring and diagnosis has been
shown in many publications, for example, [22].
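
As a sketch of how a spectrogram can be turned into features, the code below uses
`scipy.signal.stft` with its default Hann window and 50% overlap. The two per-frame features
returned (dominant frequency and total energy) and the default window length are illustrative
choices of ours, not ones taken from [22].

```python
import numpy as np
from scipy.signal import stft


def spectrogram_features(x, fs, window_len=256):
    """Per-frame dominant frequency and total energy from the spectrogram."""
    freqs, times, coeffs = stft(x, fs=fs, nperseg=window_len)
    power = np.abs(coeffs) ** 2                  # spectrogram, shape (freq, time)
    dominant = freqs[np.argmax(power, axis=0)]   # strongest frequency in each frame
    energy = power.sum(axis=0)                   # total energy in each frame
    return times, dominant, energy


if __name__ == "__main__":
    fs = 2000.0
    t = np.arange(4000) / fs
    # Frequency jumps from 100 Hz to 300 Hz halfway through the record.
    x = np.where(t < 1.0, np.sin(2 * np.pi * 100 * t), np.sin(2 * np.pi * 300 * t))
    times, dominant, energy = spectrogram_features(x, fs)
    print(dominant[::4])
```

A longer `window_len` sharpens the frequency estimate but blurs the time at which the change
occurs, which is the window-length trade-off described above.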

b) Wigner-Ville distribution: The aforementioned STFT is conceptually simple. However, this
simple scheme has a severe drawback: it cannot provide good resolution simultaneously in both the
time and frequency domains. The Wigner-Ville distribution [50] is a bilinear transform and thus
does not have this limitation of the spectrogram. However, the bilinear transform introduces
interference terms that make interpretation of the estimated distribution difficult. The
Choi-Williams time-frequency distribution [51] was developed to overcome this disadvantage.

c) Spectral kurtosis: Spectral kurtosis (SK) was first introduced as a statistical tool for
detecting the presence of transients (non-stationary components) in a signal and their location
in the frequency domain. Antoni and Randall [29] proposed a comprehensive formalization and
introduced it into rotating machine diagnostics. They observed that SK and power spectral density
(PSD) complement each other: the PSD can be thought of as a measure of position (time average),
whereas the SK can be thought of as a measure of dispersion (time variance) of a time-frequency
energy density [30]. By definition, SK is large in frequency bands where the impulsive bearing
fault signal is dominant and essentially zero in bands where stationary components are dominant.
SK has often been used for selecting frequency bands for demodulation and filtering [31]. To use
SK for feature extraction, one can follow the approach described above for spectral analysis, that
is, calculate the kurtosis at the bearing characteristic frequencies. Other characteristics, such
as the maximum value, the mean value, and even shape statistics, can also be calculated and used
as features.
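
One common way to estimate SK is from the STFT, as the normalized fourth-order moment of the
STFT magnitude minus 2. The sketch below follows that form; the window length and function name
are our assumptions, and the estimator details in [29] [30] should be consulted before relying
on it.

```python
import numpy as np
from scipy.signal import stft


def spectral_kurtosis(x, fs, window_len=128):
    """STFT-based SK estimate: <|X|^4> / <|X|^2>^2 - 2 for each frequency bin."""
    freqs, _, coeffs = stft(x, fs=fs, nperseg=window_len, boundary=None, padded=False)
    mag2 = np.abs(coeffs) ** 2
    # Averages are taken over time frames (axis 1). The offset of 2 makes SK
    # approximately zero for stationary Gaussian noise at interior bins; the
    # first and last bins are real-valued and do not follow that rule.
    sk = np.mean(mag2 ** 2, axis=1) / np.mean(mag2, axis=1) ** 2 - 2.0
    return freqs, sk


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    fs = 10000.0
    x = rng.normal(size=2 ** 16)
    n = np.arange(200)
    burst = 5.0 * np.exp(-n / 20.0) * np.sin(2 * np.pi * 3000.0 * n / fs)
    for start in range(500, len(x) - 200, 1000):
        x[start:start + 200] += burst          # synthetic repetitive transients
    freqs, sk = spectral_kurtosis(x, fs)
    print(freqs[1 + np.argmax(sk[1:-1])])
```

The band where SK peaks is a candidate band for the band-pass step of envelope analysis, and the
SK values themselves (maximum, mean, or values in selected bands) can be used as features.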

d) Empirical mode decomposition: Proposed by Huang et al. [35], empirical mode decomposition
(EMD) is a new time-frequency domain signal analysis method. EMD decomposes a complicated signal
into a finite number of intrinsic mode functions (IMFs). Each of those IMFs can then be analyzed
to extract characteristic information of the original signal. In [16], the so-called EMD energy
entropy calculated from the IMFs was used as a feature for roller bearing fault diagnosis. The
authors also showed that EMD-based features could identify roller bearing fault patterns more
effectively than those based on wavelet packet decomposition. In [17], the non-stationary
vibration signal of a roller bearing was first decomposed by EMD into IMFs that were stationary.
An AR model was established for each IMF, and the AR parameters were used as features for bearing
fault diagnosis.

e) Cyclostationary analysis: Recognizing that rolling-element bearing vibrations are
cyclostationary, [32] [33] introduced cyclostationary analysis (CA) to bearing vibration analysis
as an alternative framework to other time-frequency analysis methods. The central part of CA is
the spectral correlation function (also called the cyclic spectral density (CSD)), which is
obtained by performing a 2D Fourier transform of the autocorrelation function of the vibration
signal with respect to two time variables. The CSD indicates how spectral content evolves
periodically in time; it can thus be a powerful tool for distinguishing different types of signals
(stationary, non-stationary, and periodic) and can be used for identifying the source of faults
[31]. It has been proved in [34] that the integral of the cyclic spectral density over all
frequencies is equivalent to the Fourier transform of the mean squared signal, which links the
integrated CSD to envelope analysis.

f) Adaptive signal processing techniques: The goal of an adaptive representation algorithm is to
find  an  approximation  of  a  signal,  in  terms  of  a  given  over-complete  dictionary  of  waveforms, 
that  optimizes  a  desired  characteristic.  A  number  of  methods  for  obtaining  signal  representations 
in  over-complete  dictionaries  have  been  developed  in  recent  years,  for  example,  the  matching 
pursuit  [51]  and  the  basis  pursuit  [19]  [20]. 

3.2.2.  Wavelet  Analysis  techniques:  Wavelets  can  be  used  to  perform  multiresolution  analysis 
of  the  bearing  vibration  signal,  which  involves  application  of  a  cascade  of  adjacent  band-pass 
filters  to  the  signal.  This  ability  is  useful  in  assessing  the  signal  content  at  varying  frequencies. 
For the primary problem of detection, an increase in the energy of the high-frequency signal can
often indicate the presence of incipient faults due to early spalls as well as lubrication
problems.

Multiresolution  analysis  is  also  useful  for  detecting  the  presence  of  bearing  defect  frequencies. 
Bearing  defect  frequencies  result  from  the  periodic  impacts  of  the  defective  component;  these 
impacts  can  transfer  energy  across  a  wide  band  of  resonance  frequencies.  Since  multiresolution 
analysis  using  wavelets  preserves  time  information  as  well,  all  of  the  time-domain  techniques  and 
features  can  be  applied  to  the  signal  constructed  at  an  appropriate  resolution. 

Features based on wavelets include the values of wavelet coefficients and the resolution-specific
energy content. The ability to decompose a signal into components at varying frequencies is also
advantageous for discriminating among multiple types of faults, since the contribution of each
fault can often be different at different frequencies. The use of the wavelet transform to extract
defect features from vibration signals has been examined in various early works in the literature,
including Cheung et al. [45], Mori et al. [42], Li et al. [36], Yang et al. [43], and Staszewski
et al. [44], to name a few. Peng and Chu [24] conducted a review of the application of the wavelet
transform in machine condition monitoring and fault diagnostics. More recent works focus on
wavelets that are customized to the bearing signature or that localize the analysis on the
resonant band with the highest sensitivity, as well as on the supervised learning perspective, as
in the embedding of wavelets into neural networks.

Wavelets are also useful for transient analysis of signals and therefore for detecting faults that
are accentuated by analysis of transient vibration signals. Wang and McFadden [37] and Wang [38]
presented techniques for using time-frequency representations in the analysis of transient
vibration signals. Sahambi et al. [40] presented the use of wavelets to characterize
electrocardiograms (ECG) for online detection of relevant timing intervals in ECG events, which
can be used for better interpretation of ECG signals. Holm-Hansen et al. [41] used the actual
impulse response of a ball bearing to construct a customized wavelet analytically and used it to
detect defects in the bearing. The customized wavelet approach was shown to provide better
sensitivity in detecting bearing-induced signatures than standard mother wavelets applied to the
same analysis.

Shi et al. [39] presented a wavelet-based technique to improve the sensitivity of traditional
enveloping. They make use of the Shannon entropy of wavelet-based spectra to identify the optimal
scale, and thus the optimal resonance frequency, for monitoring bearing condition.

Wavelet neural networks (WNNs) [46] provide a mechanism to avoid explicit extraction of
features from the wavelet transforms. Instead, the adaptive learning capabilities of classical
neural networks are used to tune the parameters of the mother wavelet so that the resulting
features best separate normal signals from signals containing known faults.

Several types of features can be extracted using wavelet-based methods; they can be
categorized roughly into wavelet coefficient-based, wavelet energy-based, singularity-based, and
wavelet function-based methods [21].
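
To make the wavelet energy-based category concrete, the sketch below computes the relative
energy in each level of a discrete wavelet decomposition. It uses the Haar wavelet implemented
directly in NumPy so that the example is self-contained; the choice of Haar, the number of
levels, and the function name are ours, and other mother wavelets are usually preferred for
bearing signals.

```python
import numpy as np


def haar_wavelet_energies(x, levels=4):
    """Relative energy of Haar detail levels 1..levels plus the final approximation."""
    approx = np.asarray(x, dtype=float)
    if len(approx) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    energies = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)   # high-pass half-band at this level
        approx = (even + odd) / np.sqrt(2.0)   # low-pass half-band, decomposed further
        energies.append(np.sum(detail ** 2))
    energies.append(np.sum(approx ** 2))
    energies = np.array(energies)
    # The transform is orthonormal, so the energies sum to the signal energy
    # (assumed nonzero here).
    return energies / energies.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(5)
    print(haar_wavelet_energies(rng.normal(size=1024), levels=4))
```

The first entries correspond to the highest-frequency bands, so a shift of relative energy
toward them is one simple indicator of the high-frequency energy increase discussed earlier in
this section.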

3.3. Other techniques: In addition to the aforementioned feature extraction techniques, there are
other methods that use computational intelligence techniques for constructing features. For
example, in [6] [7], genetic programming (GP) was used to construct better features. Other
statistical transformation methods, such as principal component analysis (PCA) and linear
discriminant analysis (LDA), can also be used for constructing higher-level features from the
original features.
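
As a sketch of how PCA constructs higher-level features, the code below projects a matrix of
original features (one row per signal segment, one column per feature) onto its leading
principal components using the singular value decomposition. The function name and the choice
of two components are assumptions; in practice the columns are often standardized first because
the original features have different units.

```python
import numpy as np


def pca_features(feature_matrix, n_components=2):
    """Project features onto the leading principal components (via SVD)."""
    X = np.asarray(feature_matrix, dtype=float)
    X = X - X.mean(axis=0)                       # center each feature column
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    components = vt[:n_components]               # principal directions (rows)
    scores = X @ components.T                    # new higher-level features
    explained = s[:n_components] ** 2 / np.sum(s ** 2)  # variance fraction per component
    return scores, components, explained


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    X = rng.normal(size=(200, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
    scores, components, explained = pca_features(X, 2)
    print(scores.shape, explained)
```

Unlike PCA, LDA requires class labels (for example, fault types) and seeks directions that
separate the classes rather than directions of maximum variance.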

4. Conclusions: Identifying a set of salient signatures/features has always been an important and
challenging task in multiple fields, such as machine learning, pattern recognition, and data
mining. Extracting good features from sensor measurements is also critical in the design of bearing
PHM  systems.  With  increasing  demand  for  more  advanced  bearing  PHM  technologies  and 
continuously  increasing  research  attention  to  feature  extraction  technologies,  a  large  number  of 
feature  extraction  techniques  have  been  explored.  This  paper  attempts  to  survey  some  of  those 
feature  extraction  techniques,  especially  the  recent  developments.  Even  though  our  survey  is  not 
meant  to  be  exhaustive,  we  hope  that  this  work  will  be  helpful  to  those  who  are  interested  in 
feature  extraction  in  general  and  to  those  who  are  involved  in  choosing  proper  feature  extraction 
methods  for  their  own  applications. 